Multimodal Video Concept Detection via Bag of Auditory Words and Multiple Kernel Learning
نویسندگان
چکیده
State-of-the-art systems for video concept detection mainly rely on visual features. Some previous approaches have also included audio features, either using low-level features such as mel-frequency cepstral coefficients (MFCC) or exploiting the detection of specific audio concepts. In this paper, we investigate a bag of auditory words (BoAW) approach that models MFCC features in an auditory vocabulary. The resulting BoAW features are combined with state-of-the-art visual features via multiple kernel learning (MKL). Experiments on a large set of 101 video concepts from the MediaMill Challenge show the effectiveness of using BoAW features: The system using BoAW features and a support vector machine with a χ-kernel is superior to a state-of-the-art audio approach relying on probabilistic latent semantic indexing. Furthermore, it is shown that an early fusion approach degrades detection performance, whereas the combination of auditory and visual bag of words features via MKL yields a relative performance improvement of 9%.
منابع مشابه
The MediaMill TRECVID 2009 Semantic Video Search Engine
In this paper we describe our TRECVID 2009 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of last year, which uses multiple color descriptors, codebooks with soft-assignment, and kernel-based supervised l...
متن کاملEnhancing Recognition of Visual Concepts with Primitive Color Histograms via Non-sparse Multiple Kernel Learning
In order to achieve good performance in image annotation tasks, it is necessary to combine information from various image features. In recent competitions on photo annotation, many groups employed the bag-of-words (BoW) representations based on the SIFT descriptors over various color channels. In fact, it has been observed that adding other less informative features to the standard BoW degrades...
متن کاملIranian EFL Learners L2 Reading Comprehension: The Effect of Online Annotations via Interactive White Boards
This study explores the effect of online annotations via Interactive White Boards (IWBs) on reading comprehension of Iranian EFL learners. To this aim, 60 students from a language institute were selected as homogeneous based on their performance on Oxford Placement Test (2014).Then, they were randomly assigned to 3 experimental groups of 20, and subsequently exposed to the research treatment af...
متن کاملThe MediaMill TRECVID 2008 Semantic Video Search Engine
In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of detectors using a bag-of-words approach. T...
متن کاملFusing appearance and distribution information of interest points for action recognition
Most of the existing action recognition methods represent actions as bags of space-time interest points. Specifically, space-time interest points are detected from the video and described using appearancebased descriptors. Each descriptor is then classified as a video-word and a histogram of these videowords is used for recognition. These methods therefore rely solely on the discriminative powe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012